<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
  <title>Description of v_specsub</title>
  <meta name="keywords" content="v_specsub">
  <meta name="description" content="V_SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)">
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <meta name="generator" content="m2html &copy; 2003 Guillaume Flandin">
  <meta name="robots" content="index, follow">
  <link type="text/css" rel="stylesheet" href="../m2html.css">
</head>
<body>
<a name="_top"></a>
<div><a href="../index.html">Home</a> &gt;  <a href="index.html">v_mfiles</a> &gt; v_specsub.m</div>

<!--<table width="100%"><tr><td align="left"><a href="../index.html"><img alt="<" border="0" src="../left.png">&nbsp;Master index</a></td>
<td align="right"><a href="index.html">Index for v_mfiles&nbsp;<img alt=">" border="0" src="../right.png"></a></td></tr></table>-->

<h1>v_specsub
</h1>

<h2><a name="_name"></a>PURPOSE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>V_SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)</strong></div>

<h2><a name="_synopsis"></a>SYNOPSIS <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="box"><strong>function [ss,gg,tt,ff,zo]=v_specsub(si,fsz,pp) </strong></div>

<h2><a name="_description"></a>DESCRIPTION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre class="comment">V_SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)

 Usage: (1) y=v_specsub(x,fs);   % enhance the speech using default parameters

 Inputs:
   si      input speech signal
   fsz     sample frequency in Hz
           Alternatively, the input state from a previous call (see below)
   pp      algorithm parameters [optional]

 Outputs:
   ss        output enhanced speech
   gg(t,f,i) selected time-frequency values (see pp.tf below)
   tt        centre of frames (in seconds)
   ff        centre of frequency bins (in Hz)
   zo        output state (or the 2nd argument if gg,tt,ff are omitted)

 The algorithm operation is controlled by a small number of parameters:

        pp.of          % overlap factor = (fft length)/(frame increment) [2]
        pp.ti          % desired frame increment [0.016 seconds]
        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]
        pp.g           % subtraction domain: 1=magnitude, 2=power [1]
        pp.e           % gain exponent [1]
        pp.am          % max oversubtraction factor [3]
        pp.b           % max noise attenutaion in power domain [0.01]
        pp.al          % SNR for oversubtraction=am (set this to Inf for fixed a) [-5 dB]
        pp.ah          % SNR for oversubtraction=1 [20 dB]
        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]
        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]
        pp.mx          % input mixture gain [0]
        pp.gh          % maximum gain for noise floor [1]
        pp.rf          % round output signal to an exact number of frames [0]
        pp.tf          % selects time-frequency planes to output in the gg() variable ['g']
                           'i' = input power spectrum
                           'I' = input complex spectrum
                           'n' = noise power spectrum
                           'g' = gain
                           'o' = output power spectrum
                           'O' = output complex spectrum

 Following [1], the magnitude-domain gain in each time-frequency bin is given by
                          gain=mx+(1-mx)*max((1-(a*N/X)^(g/2))^(e/g),min(gh,(b*N/X)^(e/2)))
 where N and X are the powers of the noise and noisy speech respectively.
 The oversubtraction factor varies linearly between a=am for a frame SNR of al down to
 a=1 for a frame SNR of ah. To obtain a fixed value of a for all values of SNR, set al=Inf.
 Common exponent combinations are:
                      g=1  e=1    Magnitude Domain spectral subtraction
                      g=2  e=1    Power Domain spectral subtraction
                      g=2  e=2    Wiener filtering
 Many authors use the parameters alpha=a^(g/2), beta=b^(g/2) and gamma2=e/g instead of a, b and e
 but this increases interdependence amongst the parameters.
 If bt&gt;=0 then the max(...) expression above is thresholded to become 0 or 1.

 In addition it is possible to specify parameters for the noise estimation algorithm
 which implements reference [2] or [3] according to the setting of pp.ne

 Minimum statistics noise estimate [2]: pp.ne=0 
        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
        pp.qeqmin    % (23): minimum value of Qeq [2]
        pp.qeqmax    % max value of Qeq per frame [14]
        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
        pp.td        % time to take minimum over [1.536 seconds]
        pp.nu        % number of subwindows to use [3]
        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]

 MMSE noise estimate [3]: pp.ne=1 
        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
        pp.pnsaf    % noise probability safety value [0.01] (24)
        pp.pspri    % prior speech probability [0.5] (18)
        pp.asnr     % active SNR in dB [15] (18)
        pp.psini    % initial speech probability [0.5] (23)
        pp.tavini   % assumed speech absent time at start [0.064 seconds]

 If convenient, you can call v_specsub in chunks of arbitrary size. Thus the following are equivalent:

                   (a) y=v_specsub(s,fs);

                   (b) [y1,z]=v_specsub(s(1:1000),fs);
                       [y2,z]=v_specsub(s(1001:2000),z);
                       y3=v_specsub(s(2001:end),z);
                       y=[y1; y2; y3];

 If the number of output arguments is either 2 or 5, the last partial frame of samples will
 be retained for overlap adding with the output from the next call to v_specsub().

 See also v_ssubmmse() for an alternative gain function

 Refs:
    [1] M. Berouti, R. Schwartz and J. Makhoul
        Enhancement of speech corrupted by acoustic noise
        Proc IEEE ICASSP, 1979, 4, 208-211
    [2] Rainer Martin.
        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
    [3] Gerkmann, T. &amp; Hendriks, R. C.
        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393</pre></div>

<!-- crossreference -->
<h2><a name="_cross"></a>CROSS-REFERENCE INFORMATION <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
This function calls:
<ul style="list-style-image:url(../matlabicon.gif)">
<li><a href="v_enframe.html" class="code" title="function [f,t,w]=v_enframe(x,win,hop,m,fs)">v_enframe</a>	V_ENFRAME split signal up into (overlapping) frames: one per row. [F,T]=(X,WIN,HOP)</li><li><a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>	V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)</li><li><a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>	V_ESTNOISEM - estimate noise spectrum using minimum statistics</li><li><a href="v_irfft.html" class="code" title="function x=v_irfft(y,n,d)">v_irfft</a>	V_IRFFT    Inverse fft of a conjugate symmetric spectrum X=(Y,N,D)</li><li><a href="v_rfft.html" class="code" title="function y=v_rfft(x,n,d)">v_rfft</a>	V_RFFT     Calculate the DFT of real data Y=(X,N,D)</li></ul>
This function is called by:
<ul style="list-style-image:url(../matlabicon.gif)">
</ul>
<!-- crossreference -->


<h2><a name="_source"></a>SOURCE CODE <a href="#_top"><img alt="^" border="0" src="../up.png"></a></h2>
<div class="fragment"><pre>0001 <a name="_sub0" href="#_subfunctions" class="code">function [ss,gg,tt,ff,zo]=v_specsub(si,fsz,pp)</a>
0002 <span class="comment">%V_SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)</span>
0003 <span class="comment">%</span>
0004 <span class="comment">% Usage: (1) y=v_specsub(x,fs);   % enhance the speech using default parameters</span>
0005 <span class="comment">%</span>
0006 <span class="comment">% Inputs:</span>
0007 <span class="comment">%   si      input speech signal</span>
0008 <span class="comment">%   fsz     sample frequency in Hz</span>
0009 <span class="comment">%           Alternatively, the input state from a previous call (see below)</span>
0010 <span class="comment">%   pp      algorithm parameters [optional]</span>
0011 <span class="comment">%</span>
0012 <span class="comment">% Outputs:</span>
0013 <span class="comment">%   ss        output enhanced speech</span>
0014 <span class="comment">%   gg(t,f,i) selected time-frequency values (see pp.tf below)</span>
0015 <span class="comment">%   tt        centre of frames (in seconds)</span>
0016 <span class="comment">%   ff        centre of frequency bins (in Hz)</span>
0017 <span class="comment">%   zo        output state (or the 2nd argument if gg,tt,ff are omitted)</span>
0018 <span class="comment">%</span>
0019 <span class="comment">% The algorithm operation is controlled by a small number of parameters:</span>
0020 <span class="comment">%</span>
0021 <span class="comment">%        pp.of          % overlap factor = (fft length)/(frame increment) [2]</span>
0022 <span class="comment">%        pp.ti          % desired frame increment [0.016 seconds]</span>
0023 <span class="comment">%        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]</span>
0024 <span class="comment">%        pp.g           % subtraction domain: 1=magnitude, 2=power [1]</span>
0025 <span class="comment">%        pp.e           % gain exponent [1]</span>
0026 <span class="comment">%        pp.am          % max oversubtraction factor [3]</span>
0027 <span class="comment">%        pp.b           % max noise attenutaion in power domain [0.01]</span>
0028 <span class="comment">%        pp.al          % SNR for oversubtraction=am (set this to Inf for fixed a) [-5 dB]</span>
0029 <span class="comment">%        pp.ah          % SNR for oversubtraction=1 [20 dB]</span>
0030 <span class="comment">%        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]</span>
0031 <span class="comment">%        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]</span>
0032 <span class="comment">%        pp.mx          % input mixture gain [0]</span>
0033 <span class="comment">%        pp.gh          % maximum gain for noise floor [1]</span>
0034 <span class="comment">%        pp.rf          % round output signal to an exact number of frames [0]</span>
0035 <span class="comment">%        pp.tf          % selects time-frequency planes to output in the gg() variable ['g']</span>
0036 <span class="comment">%                           'i' = input power spectrum</span>
0037 <span class="comment">%                           'I' = input complex spectrum</span>
0038 <span class="comment">%                           'n' = noise power spectrum</span>
0039 <span class="comment">%                           'g' = gain</span>
0040 <span class="comment">%                           'o' = output power spectrum</span>
0041 <span class="comment">%                           'O' = output complex spectrum</span>
0042 <span class="comment">%</span>
0043 <span class="comment">% Following [1], the magnitude-domain gain in each time-frequency bin is given by</span>
0044 <span class="comment">%                          gain=mx+(1-mx)*max((1-(a*N/X)^(g/2))^(e/g),min(gh,(b*N/X)^(e/2)))</span>
0045 <span class="comment">% where N and X are the powers of the noise and noisy speech respectively.</span>
0046 <span class="comment">% The oversubtraction factor varies linearly between a=am for a frame SNR of al down to</span>
0047 <span class="comment">% a=1 for a frame SNR of ah. To obtain a fixed value of a for all values of SNR, set al=Inf.</span>
0048 <span class="comment">% Common exponent combinations are:</span>
0049 <span class="comment">%                      g=1  e=1    Magnitude Domain spectral subtraction</span>
0050 <span class="comment">%                      g=2  e=1    Power Domain spectral subtraction</span>
0051 <span class="comment">%                      g=2  e=2    Wiener filtering</span>
0052 <span class="comment">% Many authors use the parameters alpha=a^(g/2), beta=b^(g/2) and gamma2=e/g instead of a, b and e</span>
0053 <span class="comment">% but this increases interdependence amongst the parameters.</span>
0054 <span class="comment">% If bt&gt;=0 then the max(...) expression above is thresholded to become 0 or 1.</span>
0055 <span class="comment">%</span>
0056 <span class="comment">% In addition it is possible to specify parameters for the noise estimation algorithm</span>
0057 <span class="comment">% which implements reference [2] or [3] according to the setting of pp.ne</span>
0058 <span class="comment">%</span>
0059 <span class="comment">% Minimum statistics noise estimate [2]: pp.ne=0</span>
0060 <span class="comment">%        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]</span>
0061 <span class="comment">%        pp.tamax     % (3): max smoothing time constant [0.392 seconds]</span>
0062 <span class="comment">%        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]</span>
0063 <span class="comment">%        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]</span>
0064 <span class="comment">%        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]</span>
0065 <span class="comment">%        pp.qeqmin    % (23): minimum value of Qeq [2]</span>
0066 <span class="comment">%        pp.qeqmax    % max value of Qeq per frame [14]</span>
0067 <span class="comment">%        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]</span>
0068 <span class="comment">%        pp.td        % time to take minimum over [1.536 seconds]</span>
0069 <span class="comment">%        pp.nu        % number of subwindows to use [3]</span>
0070 <span class="comment">%        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]</span>
0071 <span class="comment">%        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]</span>
0072 <span class="comment">%</span>
0073 <span class="comment">% MMSE noise estimate [3]: pp.ne=1</span>
0074 <span class="comment">%        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)</span>
0075 <span class="comment">%        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)</span>
0076 <span class="comment">%        pp.psthr    % threshold for smoothed speech probability [0.99] (24)</span>
0077 <span class="comment">%        pp.pnsaf    % noise probability safety value [0.01] (24)</span>
0078 <span class="comment">%        pp.pspri    % prior speech probability [0.5] (18)</span>
0079 <span class="comment">%        pp.asnr     % active SNR in dB [15] (18)</span>
0080 <span class="comment">%        pp.psini    % initial speech probability [0.5] (23)</span>
0081 <span class="comment">%        pp.tavini   % assumed speech absent time at start [0.064 seconds]</span>
0082 <span class="comment">%</span>
0083 <span class="comment">% If convenient, you can call v_specsub in chunks of arbitrary size. Thus the following are equivalent:</span>
0084 <span class="comment">%</span>
0085 <span class="comment">%                   (a) y=v_specsub(s,fs);</span>
0086 <span class="comment">%</span>
0087 <span class="comment">%                   (b) [y1,z]=v_specsub(s(1:1000),fs);</span>
0088 <span class="comment">%                       [y2,z]=v_specsub(s(1001:2000),z);</span>
0089 <span class="comment">%                       y3=v_specsub(s(2001:end),z);</span>
0090 <span class="comment">%                       y=[y1; y2; y3];</span>
0091 <span class="comment">%</span>
0092 <span class="comment">% If the number of output arguments is either 2 or 5, the last partial frame of samples will</span>
0093 <span class="comment">% be retained for overlap adding with the output from the next call to v_specsub().</span>
0094 <span class="comment">%</span>
0095 <span class="comment">% See also v_ssubmmse() for an alternative gain function</span>
0096 <span class="comment">%</span>
0097 <span class="comment">% Refs:</span>
0098 <span class="comment">%    [1] M. Berouti, R. Schwartz and J. Makhoul</span>
0099 <span class="comment">%        Enhancement of speech corrupted by acoustic noise</span>
0100 <span class="comment">%        Proc IEEE ICASSP, 1979, 4, 208-211</span>
0101 <span class="comment">%    [2] Rainer Martin.</span>
0102 <span class="comment">%        Noise power spectral density estimation based on optimal smoothing and minimum statistics.</span>
0103 <span class="comment">%        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.</span>
0104 <span class="comment">%    [3] Gerkmann, T. &amp; Hendriks, R. C.</span>
0105 <span class="comment">%        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay</span>
0106 <span class="comment">%        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393</span>
0107 
0108 <span class="comment">%      Copyright (C) Mike Brookes 2004</span>
0109 <span class="comment">%      Version: $Id: v_specsub.m 10865 2018-09-21 17:22:45Z dmb $</span>
0110 <span class="comment">%</span>
0111 <span class="comment">%   VOICEBOX is a MATLAB toolbox for speech processing.</span>
0112 <span class="comment">%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html</span>
0113 <span class="comment">%</span>
0114 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0115 <span class="comment">%   This program is free software; you can redistribute it and/or modify</span>
0116 <span class="comment">%   it under the terms of the GNU General Public License as published by</span>
0117 <span class="comment">%   the Free Software Foundation; either version 2 of the License, or</span>
0118 <span class="comment">%   (at your option) any later version.</span>
0119 <span class="comment">%</span>
0120 <span class="comment">%   This program is distributed in the hope that it will be useful,</span>
0121 <span class="comment">%   but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
0122 <span class="comment">%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the</span>
0123 <span class="comment">%   GNU General Public License for more details.</span>
0124 <span class="comment">%</span>
0125 <span class="comment">%   You can obtain a copy of the GNU General Public License from</span>
0126 <span class="comment">%   http://www.gnu.org/copyleft/gpl.html or by writing to</span>
0127 <span class="comment">%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.</span>
0128 <span class="comment">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%</span>
0129 <span class="keyword">if</span> numel(si)&gt;length(si)
0130     error(<span class="string">'Input speech signal must be a vector not a matrix'</span>);
0131 <span class="keyword">end</span>
0132 <span class="keyword">if</span> isstruct(fsz)
0133     fs=fsz.fs;
0134     qq=fsz.qq;
0135     qp=fsz.qp;
0136     ze=fsz.ze;
0137     s=zeros(length(fsz.si)+length(si(:)),1); <span class="comment">% allocate space for speech</span>
0138     s(1:length(fsz.si))=fsz.si;
0139     s(length(fsz.si)+1:end)=si(:);
0140 <span class="keyword">else</span>
0141     fs=fsz;     <span class="comment">% sample frequency</span>
0142     s=si(:);
0143     <span class="comment">% default algorithm constants</span>
0144 
0145     qq.of=2;   <span class="comment">% overlap factor = (fft length)/(frame increment)</span>
0146     qq.ti=16e-3;   <span class="comment">% desired frame increment (16 ms)</span>
0147     qq.ri=0;       <span class="comment">% round ni to the nearest power of 2</span>
0148     qq.g=1;        <span class="comment">% subtraction domain: 1=magnitude, 2=power</span>
0149     qq.e=1;        <span class="comment">% gain exponent</span>
0150     qq.am=3;      <span class="comment">% max oversubtraction factor</span>
0151     qq.b=0.01;      <span class="comment">% noise floor</span>
0152     qq.al=-5;       <span class="comment">% SNR for maximum a (set to Inf for fixed a)</span>
0153     qq.ah=20;       <span class="comment">% SNR for minimum a</span>
0154     qq.bt=-1;       <span class="comment">% suppress binary masking</span>
0155     qq.ne=0;        <span class="comment">% noise estimation: 0=min statistics, 1=MMSE [0]</span>
0156     qq.mx=0;        <span class="comment">% no input mixing</span>
0157     qq.gh=1;        <span class="comment">% maximum gain</span>
0158     qq.tf=<span class="string">'g'</span>;      <span class="comment">% output the gain time-frequency plane by default</span>
0159     qq.rf=0;
0160     <span class="keyword">if</span> nargin&gt;=3 &amp;&amp; ~isempty(pp)
0161         qp=pp;      <span class="comment">% save for v_estnoisem call</span>
0162         qqn=fieldnames(qq);
0163         <span class="keyword">for</span> i=1:length(qqn)
0164             <span class="keyword">if</span> isfield(pp,qqn{i})
0165                 qq.(qqn{i})=pp.(qqn{i});
0166             <span class="keyword">end</span>
0167         <span class="keyword">end</span>
0168     <span class="keyword">else</span>
0169         qp=struct;  <span class="comment">% make an empty structure</span>
0170     <span class="keyword">end</span>
0171 <span class="keyword">end</span>
0172 <span class="comment">% derived algorithm constants</span>
0173 <span class="keyword">if</span> qq.ri
0174     ni=pow2(nextpow2(qq.ti*fs*sqrt(0.5)));
0175 <span class="keyword">else</span>
0176     ni=round(qq.ti*fs);    <span class="comment">% frame increment in samples</span>
0177 <span class="keyword">end</span>
0178 tinc=ni/fs;          <span class="comment">% true frame increment time</span>
0179 tf=qq.tf;
0180 rf=qq.rf || nargout==2 || nargout==5;            <span class="comment">% round down to an exact number of frames</span>
0181 ne=qq.ne;           <span class="comment">% noise estimation: 0=min statistics, 1=MMSE [0]</span>
0182 
0183 <span class="comment">% calculate power spectrum in frames</span>
0184 
0185 no=round(qq.of);                                   <span class="comment">% integer overlap factor</span>
0186 nf=ni*no;           <span class="comment">% fft length</span>
0187 w=sqrt(hamming(nf+1))'; w(end)=[]; <span class="comment">% for now always use sqrt hamming window</span>
0188 w=w/sqrt(sum(w(1:ni:nf).^2));       <span class="comment">% normalize to give overall gain of 1</span>
0189 <span class="keyword">if</span> rf&gt;0
0190     rfm=<span class="string">''</span>;                         <span class="comment">% truncated input to an exact number of frames</span>
0191 <span class="keyword">else</span>
0192     rfm=<span class="string">'r'</span>;
0193 <span class="keyword">end</span>
0194 [y,tt]=<a href="v_enframe.html" class="code" title="function [f,t,w]=v_enframe(x,win,hop,m,fs)">v_enframe</a>(s,w,ni,rfm);
0195 tt=tt/fs;                           <span class="comment">% frame times</span>
0196 yf=<a href="v_rfft.html" class="code" title="function y=v_rfft(x,n,d)">v_rfft</a>(y,nf,2);
0197 yp=yf.*conj(yf);        <span class="comment">% power spectrum of input speech</span>
0198 [nr,nf2]=size(yp);              <span class="comment">% number of frames</span>
0199 ff=(0:nf2-1)*fs/nf;
0200 <span class="keyword">if</span> isstruct(fsz)
0201     <span class="keyword">if</span> ne&gt;0
0202         [dp,ze]=<a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>(yp,ze);       <span class="comment">% estimate the noise using MMSE</span>
0203     <span class="keyword">else</span>
0204         [dp,ze]=<a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>(yp,ze);       <span class="comment">% estimate the noise using minimum statistics</span>
0205     <span class="keyword">end</span>
0206     ssv=fsz.ssv;
0207 <span class="keyword">else</span>
0208     <span class="keyword">if</span> ne&gt;0
0209         [dp,ze]=<a href="v_estnoiseg.html" class="code" title="function [x,zo]=v_estnoiseg(yf,tz,pp)">v_estnoiseg</a>(yp,tinc,qp);    <span class="comment">% estimate the noise using MMSE</span>
0210     <span class="keyword">else</span>
0211         [dp,ze]=<a href="v_estnoisem.html" class="code" title="function [x,zo,xs]=v_estnoisem(yf,tz,pp)">v_estnoisem</a>(yp,tinc,qp);    <span class="comment">% estimate the noise using minimum statistics</span>
0212     <span class="keyword">end</span>
0213     ssv=zeros(ni*(no-1),1);             <span class="comment">% dummy saved overlap</span>
0214 <span class="keyword">end</span>
0215 <span class="keyword">if</span> ~nr                                  <span class="comment">% no data frames</span>
0216     ss=[];
0217     gg=[];
0218 <span class="keyword">else</span>
0219     mz=yp==0;   <span class="comment">%  mask for zero power time-frequency bins (unlikely)</span>
0220     <span class="keyword">if</span> qq.al&lt;Inf
0221         ypf=sum(yp,2);
0222         dpf=sum(dp,2);
0223         mzf=dpf==0;     <span class="comment">% zero noise frames = very high SNR</span>
0224         af=1+(qq.am-1)*(min(max(10*log10(ypf./(dpf+mzf)),qq.al),qq.ah)-qq.ah)/(qq.al-qq.ah);
0225         af(mzf)=1;      <span class="comment">% fix the zero noise frames</span>
0226     <span class="keyword">else</span>
0227         af=repmat(qq.am,nr,1);
0228     <span class="keyword">end</span>
0229     <span class="keyword">switch</span> qq.g
0230         <span class="keyword">case</span> 1   <span class="comment">% magnitude domain subtraction</span>
0231             v=sqrt(dp./(yp+mz));
0232             af=sqrt(af);
0233             bf=sqrt(qq.b);
0234         <span class="keyword">case</span> 2   <span class="comment">% power domain subtraction</span>
0235             v=dp./(yp+mz);
0236             bf=qq.b;
0237         <span class="keyword">otherwise</span> <span class="comment">% arbitrary subtraction domain</span>
0238             v=(dp./(yp+mz)).^(0.5*qq.g);
0239             af=af.^(0.5*qq.g);
0240             bf=qq.b^(0.5*qq.g);
0241     <span class="keyword">end</span>
0242     af =repmat(af,1,nf2);       <span class="comment">% replicate frame oversubtraction factors for each frequency</span>
0243     mf=v&gt;=(af+bf).^(-1);        <span class="comment">% mask for noise floor limiting</span>
0244     g=zeros(size(v));           <span class="comment">% reserve space for gain matrix</span>
0245     eg=qq.e/qq.g;               <span class="comment">% gain exponent relative to subtraction domain</span>
0246     gh=qq.gh;
0247     <span class="keyword">switch</span> eg
0248         <span class="keyword">case</span> 1                          <span class="comment">% Normal case</span>
0249             g(mf)=min(bf*v(mf),gh);      <span class="comment">% never give a gain &gt; 1</span>
0250             g(~mf)=1-af(~mf).*v(~mf);
0251         <span class="keyword">case</span> 0.5
0252             g(mf)=min(sqrt(bf*v(mf)),gh);
0253             g(~mf)=sqrt(1-af(~mf).*v(~mf));
0254         <span class="keyword">otherwise</span>
0255             g(mf)=min((bf*v(mf)).^eg,gh);
0256             g(~mf)=(1-af(~mf).*v(~mf)).^eg;
0257     <span class="keyword">end</span>
0258     <span class="keyword">if</span> qq.bt&gt;=0
0259         g=g&gt;qq.bt;
0260     <span class="keyword">end</span>
0261     g=qq.mx+(1-qq.mx)*g;   <span class="comment">% mix in some of the input</span>
0262     se=(<a href="v_irfft.html" class="code" title="function x=v_irfft(y,n,d)">v_irfft</a>((yf.*g).',nf).').*repmat(w,nr,1);   <span class="comment">% inverse dft and apply output window</span>
0263     ss=zeros(ni*(nr+no-1),no);                      <span class="comment">% space for overlapped output speech</span>
0264     ss(1:ni*(no-1),end)=ssv;
0265     <span class="keyword">for</span> i=1:no
0266         nm=nf*(1+floor((nr-i)/no));  <span class="comment">% number of samples in this set</span>
0267         ss(1+(i-1)*ni:nm+(i-1)*ni,i)=reshape(se(i:no:nr,:)',nm,1);
0268     <span class="keyword">end</span>
0269     ss=sum(ss,2);
0270     <span class="keyword">if</span> nargout&gt;2 &amp;&amp; ~isempty(tf)
0271         gg=zeros(nr,nf2,length(tf));  <span class="comment">% make space</span>
0272         <span class="keyword">for</span> i=1:length(tf)
0273             <span class="keyword">switch</span> tf(i)
0274                 <span class="keyword">case</span> <span class="string">'i'</span>            <span class="comment">% 'i' = input power spectrum</span>
0275                     gg(:,:,i)=yp;
0276                 <span class="keyword">case</span> <span class="string">'I'</span>            <span class="comment">% 'i' = input power spectrum</span>
0277                     gg(:,:,i)=yf;
0278                 <span class="keyword">case</span> <span class="string">'n'</span>            <span class="comment">% 'n' = noise power spectrum</span>
0279                     gg(:,:,i)=dp;
0280                 <span class="keyword">case</span> <span class="string">'g'</span>            <span class="comment">% 'g' = gain</span>
0281                     gg(:,:,i)=g;
0282                 <span class="keyword">case</span> <span class="string">'o'</span>            <span class="comment">% 'o' = output power spectrum</span>
0283                     gg(:,:,i)=yp.*g.^2;
0284                 <span class="keyword">case</span> <span class="string">'O'</span>            <span class="comment">% 'o' = output power spectrum</span>
0285                     gg(:,:,i)=yf.*g;
0286             <span class="keyword">end</span>
0287         <span class="keyword">end</span>
0288     <span class="keyword">end</span>
0289 <span class="keyword">end</span>
0290 <span class="keyword">if</span> nargout==2 || nargout==5
0291     <span class="keyword">if</span> nr
0292         zo.ssv=ss(end-ni*(no-1)+1:end);    <span class="comment">% save the output tail for next time</span>
0293         ss(end-ni*(no-1)+1:end)=[];
0294     <span class="keyword">else</span>
0295         zo.ssv=ssv;  <span class="comment">%</span>
0296     <span class="keyword">end</span>
0297     zo.si=s(length(ss)+1:end);      <span class="comment">% save the tail end of the input speech signal</span>
0298     zo.fs=fs;                       <span class="comment">% save sample frequency</span>
0299     zo.qq=qq;                       <span class="comment">% save local parameters</span>
0300     zo.qp=qp;                       <span class="comment">% save v_estnoisem parameters</span>
0301     zo.ze=ze;                       <span class="comment">% save state of noise estimation</span>
0302     <span class="keyword">if</span> nargout==2
0303         gg=zo;                      <span class="comment">% 2nd of two arguments is zo</span>
0304     <span class="keyword">end</span>
0305 <span class="keyword">elseif</span> rf==0
0306     ss=ss(1:length(s));             <span class="comment">% trim to the correct length if not an exact number of frames</span>
0307 <span class="keyword">end</span>
0308 <span class="keyword">if</span> ~nargout &amp;&amp; nr&gt;0
0309     ffax=ff/1000;    ax=zeros(4,1);
0310     ax(1)=subplot(223);
0311     imagesc(tt,ffax,20*log10(g)');
0312     colorbar;
0313     axis(<span class="string">'xy'</span>);
0314     <span class="keyword">if</span> qq.al==Inf
0315         title(sprintf(<span class="string">'Filter Gain (dB): a=%.2g, b=%.3g'</span>,qq.am,qq.b));
0316     <span class="keyword">else</span>
0317         title(sprintf(<span class="string">'Filter Gain (dB): a=%.2g (%.0f to %.0fdB), b=%.3g'</span>,qq.am,qq.al,qq.ah,qq.b));
0318     <span class="keyword">end</span>
0319     xlabel(<span class="string">'Time (s)'</span>);
0320     ylabel(<span class="string">'Frequency (kHz)'</span>);
0321 
0322     ax(2)=subplot(222);
0323     imagesc(tt,ffax,10*log10(yp)');
0324     colorbar;
0325     axis(<span class="string">'xy'</span>);
0326     title(<span class="string">'Noisy Speech (dB)'</span>);
0327     xlabel(<span class="string">'Time (s)'</span>);
0328     ylabel(<span class="string">'Frequency (kHz)'</span>);
0329 
0330     ax(3)=subplot(224);
0331     imagesc(tt,ffax,10*log10(yp.*g.^2)');
0332     colorbar;
0333     axis(<span class="string">'xy'</span>);
0334     title(sprintf(<span class="string">'Enhanced Speech (dB): g=%.2g, e=%.3g'</span>,qq.g,qq.e));
0335     xlabel(<span class="string">'Time (s)'</span>);
0336     ylabel(<span class="string">'Frequency (kHz)'</span>);
0337 
0338     ax(4)=subplot(221);
0339     imagesc(tt,ffax,10*log10(dp)');
0340     colorbar;
0341     axis(<span class="string">'xy'</span>);
0342     title(<span class="string">'Noise Estimate (dB)'</span>);
0343     xlabel(<span class="string">'Time (s)'</span>);
0344     ylabel(<span class="string">'Frequency (kHz)'</span>);
0345     linkaxes(ax);
0346 <span class="keyword">end</span></pre></div>
<hr><address>Generated by <strong><a href="http://www.artefact.tk/software/matlab/m2html/">m2html</a></strong> &copy; 2003</address>
</body>
</html>